[SPARK-39859][SQL] Support v2 DESCRIBE TABLE EXTENDED for columns#40058
[SPARK-39859][SQL] Support v2 DESCRIBE TABLE EXTENDED for columns#40058huaxingao wants to merge 2 commits intoapache:masterfrom
Conversation
dongjoon-hyun
left a comment
There was a problem hiding this comment.
+1 from my side.
cc @cloud-fan , @viirya , @sunchao , @parthchandra
| if (colStats.get.avgLen().isPresent) { | ||
| rows += toCatalystRow("max_col_len", colStats.get.avgLen().getAsLong.toString) | ||
| } else { | ||
| rows += toCatalystRow("max_col_len", "NULL") | ||
| } |
There was a problem hiding this comment.
| if (colStats.get.avgLen().isPresent) { | |
| rows += toCatalystRow("max_col_len", colStats.get.avgLen().getAsLong.toString) | |
| } else { | |
| rows += toCatalystRow("max_col_len", "NULL") | |
| } | |
| if (colStats.get.maxLen().isPresent) { | |
| rows += toCatalystRow("max_col_len", colStats.get.maxLen().getAsLong.toString) | |
| } else { | |
| rows += toCatalystRow("max_col_len", "NULL") | |
| } |
| r.table.asReadable.newScanBuilder(CaseInsensitiveStringMap.empty()).build() match { | ||
| case s: SupportsReportStatistics => | ||
| val stats = s.estimateStatistics() | ||
| Some(stats.columnStats().get(FieldReference.column(c.name))) |
There was a problem hiding this comment.
Is columnStats case-sensitive or not?
There was a problem hiding this comment.
I think this is controlled by SQLConf.CASE_SENSITIVE
### What changes were proposed in this pull request? Support v2 DESCRIBE TABLE EXTENDED for columns ### Why are the changes needed? DS v1/v2 command parity ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #40058 from huaxingao/describe_col. Authored-by: huaxingao <huaxin_gao@apple.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit ebab0ef) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
|
Merged to master/3.4 for Apache Spark 3.4.0. Thank you, @huaxingao and @viirya . cc @MaxGekk since he filed SPARK-39859 originally. |
|
Thanks @dongjoon-hyun @viirya |
| case c: Attribute => | ||
| DescribeColumnExec(output, c, isExtended) :: Nil | ||
| val colStats = | ||
| r.table.asReadable.newScanBuilder(CaseInsensitiveStringMap.empty()).build() match { |
There was a problem hiding this comment.
this may not be very cheap. Shall we only do it if isExtended is true?
There was a problem hiding this comment.
and what if the table is not readable (e.g. write-only)? We should not fail but show no column stats.
There was a problem hiding this comment.
BTW, I think it's cleaner to pass v2 Table to DescribeColumnExec and move this code block to DescribeColumnExec
There was a problem hiding this comment.
@cloud-fan Thanks for your comments. I will have a follow-up to fix this and also move the test to parent suite.
| } | ||
|
|
||
| // TODO(SPARK-39859): Support v2 `DESCRIBE TABLE EXTENDED` for columns | ||
| test("describe extended (formatted) a column") { |
There was a problem hiding this comment.
don't have to be done immediately, but it's better to move this test to the parent suite, to make sure v1 and v2 commands have the same behavior.
### What changes were proposed in this pull request? Support v2 DESCRIBE TABLE EXTENDED for columns ### Why are the changes needed? DS v1/v2 command parity ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes apache#40058 from huaxingao/describe_col. Authored-by: huaxingao <huaxin_gao@apple.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit ebab0ef) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? Support v2 DESCRIBE TABLE EXTENDED with table stats ### Why are the changes needed? Similar to #40058, make DS v1/v2 command parity, e.g. DESC EXTENDED table | col_name | data_type | comment | |-------------------|---------------------------|------------| | ... | ... | ... | | Statistics | 864 bytes, 2 rows | | | ... | ... | ... | ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add test `describe extended table with stats` ### Was this patch authored or co-authored using generative AI tooling? No Closes #44535 from Zouxxyy/dev/desc-table-stats. Lead-authored-by: zouxxyy <zouxinyu.zxy@alibaba-inc.com> Co-authored-by: Zouxxyy <zouxxyy@qq.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
Support v2 DESCRIBE TABLE EXTENDED for columns
Why are the changes needed?
DS v1/v2 command parity
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT